add a script to compute the perplexity of test data #56
base: master
Adding eval.py and updates to util.py and models.py to allow for calculating the perplexity of test files. I also modified the vocabulary to have start, end and unknown character tokens.
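For illustration, a vocabulary that reserves start, end and unknown tokens might be built roughly as follows. This is a minimal sketch, not the code from this PR: `<S>` and `</S>` match the tokens in the diff, while the `<UNK>` name and the `build_vocab` and `encode` helpers are assumptions.

```python
import collections

import numpy as np

# '<S>' and '</S>' appear in the diff; '<UNK>' is an assumed name.
START, END, UNK = '<S>', '</S>', '<UNK>'


def build_vocab(data):
    """Map each character to an id, most frequent first, then the special tokens."""
    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: -x[1])
    chars = [c for c, _ in count_pairs] + [START, END, UNK]
    vocab = dict(zip(chars, range(len(chars))))
    return chars, vocab


def encode(text, vocab):
    """Wrap text in start/end markers and map unseen characters to UNK."""
    unk_id = vocab[UNK]
    tokens = [START] + list(text) + [END]
    return np.array([vocab.get(t, unk_id) for t in tokens])


chars, vocab = build_vocab("aab")
ids = encode("abz", vocab)  # 'z' was never seen, so it maps to the UNK id
```

Falling back to the unknown id avoids the `None` entries that a plain `vocab.get` produces for characters missing from the training vocabulary.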
 count_pairs = sorted(counter.items(), key=lambda x: -x[1])
 self.chars, _ = zip(*count_pairs)
 self.vocab_size = len(self.chars)
 self.vocab = dict(zip(self.chars, range(len(self.chars))))
 with open(vocab_file, 'wb') as f:
     cPickle.dump(self.chars, f)
-self.tensor = np.array(list(map(self.vocab.get, data)))
+self.tensor = np.array(list(map(self.vocab.get, ['<S>'] + list(data) + ['</S>'])))
Do you think it would be a better idea to write this after line 59, `self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length]`, since otherwise it's unlikely that you will get the `</S>` character?
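The concern in this comment can be reproduced with a small sketch. It assumes the truncation keeps only a whole number of `batch_size * seq_length` blocks, as in the slice quoted in the comment; `truncate` and `fit_with_markers` are hypothetical helpers, and reserving two slots for the markers is just one possible way to keep the end token, not what the PR does.

```python
import numpy as np


def truncate(tensor, batch_size, seq_length):
    """Keep only as many elements as fill whole batches."""
    block = batch_size * seq_length
    num_batches = len(tensor) // block
    return tensor[:num_batches * block]


def fit_with_markers(ids, start_id, end_id, batch_size, seq_length):
    """Trim the data first, leaving room for both markers, then wrap it."""
    block = batch_size * seq_length
    usable = ((len(ids) + 2) // block) * block - 2
    if usable < 0:
        # Not even one full block is available.
        return np.array([], dtype=int)
    return np.concatenate(([start_id], ids[:usable], [end_id]))


ids = np.arange(9)              # stand-in for encoded characters
START_ID, END_ID = 100, 101     # assumed ids for '<S>' and '</S>'

# Adding the markers first and truncating afterwards drops the end marker
# unless the length happens to be an exact multiple of the block size.
wrapped = np.concatenate(([START_ID], ids, [END_ID]))
cut = truncate(wrapped, batch_size=2, seq_length=2)

# Trimming first keeps both markers and still fills whole batches.
fitted = fit_with_markers(ids, START_ID, END_ID, batch_size=2, seq_length=2)
```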
@@ -58,6 +58,29 @@ def loop(prev, _):
         optimizer = tf.train.AdamOptimizer(self.lr)
         self.train_op = optimizer.apply_gradients(zip(grads, tvars))

+    def eval(self, sess, chars, vocab, text):
+        batch_size = 200
`seq_length`, you mean?
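To make the distinction behind this comment concrete: `batch_size` is the number of sequences processed in parallel, while `seq_length` is the number of timesteps in each one. The sketch below shows how a flat id array could be cut into batches of that shape; `make_batches` is a hypothetical helper and is not taken from this PR.

```python
import numpy as np


def make_batches(tensor, batch_size, seq_length):
    """Split a 1-D id array into arrays of shape (batch_size, seq_length)."""
    num_batches = len(tensor) // (batch_size * seq_length)
    if num_batches == 0:
        raise ValueError("not enough data for a single batch")
    tensor = tensor[:num_batches * batch_size * seq_length]
    # Each row is one contiguous stream; columns are cut into seq_length windows.
    return np.split(tensor.reshape(batch_size, -1), num_batches, axis=1)


batches = make_batches(np.arange(24), batch_size=2, seq_length=3)
```

With 24 ids, 2 parallel sequences and 3 timesteps each, this yields 4 batches of shape `(2, 3)`.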
@@ -58,6 +58,29 @@ def loop(prev, _):
         optimizer = tf.train.AdamOptimizer(self.lr)
         self.train_op = optimizer.apply_gradients(zip(grads, tvars))

+    def eval(self, sess, chars, vocab, text):
It's probably better to move this to eval.py
@ajaech This PR has merge conflicts.
The eval.py script can be used to compute perplexity of test data.
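For reference, perplexity is the exponential of the average negative log-likelihood per token. The sketch below assumes the model's probability for each test character is already available; `perplexity` is a hypothetical helper and does not reflect the actual interface of eval.py.

```python
import math


def perplexity(token_probs):
    """Return exp of the mean negative log-probability of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs)
    return math.exp(nll / len(token_probs))


# A model that assigns probability 0.25 to every character is as uncertain
# as a uniform choice among four characters.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower values mean the model assigns higher probability to the test text.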